On clustering procedures and nonparametric mixture estimation
This paper deals with nonparametric estimation of conditional densities in
mixture models in the case when additional covariates are available. The
proposed approach consists of performing a preliminary clustering algorithm on
the additional covariates to guess the mixture component of each observation.
Conditional densities of the mixture model are then estimated using kernel
density estimates applied separately to each cluster. We investigate the
expected L1-error of the resulting estimates and derive optimal rates of
convergence over classical nonparametric density classes provided the
clustering method is accurate. Performances of clustering algorithms are
measured by the maximal misclassification error. We obtain upper bounds of this
quantity for a single linkage hierarchical clustering algorithm. Lastly,
applications of the proposed method to mixture models involving electricity
distribution data and simulated data are presented.
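The two-step procedure described in this abstract (cluster the covariates, then run a kernel density estimate within each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional covariates, a Gaussian kernel, a fixed bandwidth, and a single-linkage cut at a user-chosen threshold. All function and parameter names below are hypothetical.

```python
import math


def single_linkage_1d(covariates, threshold):
    """Label 1-D points so that two points share a cluster whenever a chain
    of sorted neighbours with gaps <= threshold connects them. In one
    dimension this matches cutting a single-linkage dendrogram at threshold."""
    order = sorted(range(len(covariates)), key=lambda i: covariates[i])
    labels = [0] * len(covariates)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if covariates[nxt] - covariates[prev] > threshold:
            current += 1  # gap too large: start a new cluster
        labels[nxt] = current
    return labels


def gaussian_kde(sample, bandwidth):
    """Return a function x -> Gaussian kernel density estimate built on sample."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def cluster_then_kde(observations, covariates, threshold, bandwidth):
    """Guess each observation's component from its covariate, then estimate
    one density per guessed component."""
    labels = single_linkage_1d(covariates, threshold)
    groups = {}
    for y, label in zip(observations, labels):
        groups.setdefault(label, []).append(y)
    return {label: gaussian_kde(ys, bandwidth) for label, ys in groups.items()}
```

For example, `cluster_then_kde([0.0, 0.2, 10.0, 10.2], [0.1, 0.2, 5.0, 5.1], threshold=1.0, bandwidth=0.5)` returns two density functions, one per covariate cluster. The threshold and bandwidth are fixed by hand here, whereas the abstract concerns rates obtained under suitable choices.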
Statistical analysis of k-nearest neighbor collaborative recommendation
Collaborative recommendation is an information-filtering technique that
attempts to present information items that are likely of interest to an
Internet user. Traditionally, collaborative systems deal with situations with
two types of variables, users and items. In its most common form, the problem
is framed as trying to estimate ratings for items that have not yet been
consumed by a user. Despite wide-ranging literature, little is known about the
statistical properties of recommendation systems. In fact, no clear
probabilistic model even exists which would allow us to precisely describe the
mathematical forces driving collaborative filtering. To provide an initial
contribution to this, we propose to set out a general sequential stochastic
model for collaborative recommendation. We offer an in-depth analysis of the
so-called cosine-type nearest neighbor collaborative method, which is one of
the most widely used algorithms in collaborative filtering, and analyze its
asymptotic performance as the number of users grows. We establish consistency
of the procedure under mild assumptions on the model. Rates of convergence and
examples are also provided. Comment: Published in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOS759
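As a rough illustration of the cosine-type nearest neighbor method analyzed in this abstract, the sketch below predicts a missing rating by averaging the ratings of the most similar users who have rated the item. It is a simplified assumption, not the sequential stochastic model of the paper: the data layout (one dict per user mapping item to rating), the neighbor count `k`, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine between two sparse rating vectors stored as dicts item -> rating.
    Items missing from a dict are treated as zero entries."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(target, others, item, k):
    """Average the ratings of `item` over the k users most similar to `target`
    among those who rated it. Returns None when nobody rated the item."""
    candidates = [u for u in others if item in u]
    if not candidates:
        return None
    candidates.sort(key=lambda u: cosine_similarity(target, u), reverse=True)
    nearest = candidates[:k]
    return sum(u[item] for u in nearest) / len(nearest)
```

With `target = {"a": 5, "b": 1}` and the other users `{"a": 5, "b": 1, "c": 4}`, `{"a": 1, "b": 5, "c": 2}`, and `{"a": 4, "b": 1}`, calling `predict_rating(target, others, "c", 1)` uses only the first user, whose tastes align most closely with the target's.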
Silicon nanowires as negative electrode for lithium-ion microbatteries
The increasing demand for secondary batteries with higher specific energy densities requires the replacement of the current electrode materials. With a very high theoretical capacity (4200 mAh g−1) at low voltage, silicon is presented as a very interesting potential candidate as negative electrode for lithium-ion microbatteries. For the first time, the electrochemical lithium alloying/de-alloying process is proven to occur, respectively, at 0.15 V/0.45 V vs. Li+/Li with Si nanowires (SiNWs, 200-300 nm in diameter) synthesized by chemical vapour deposition. This new three-dimensional architecture material is well suited to accommodate the expected large volume expansion due to the reversible formation of Li-Si alloys. At present, stable capacity over ten to twenty cycles is demonstrated. The storage capacity is shown to increase with the growth temperature by a factor of 3 as the temperature varies from 525 to 575 °C. These results, showing an attractive working potential and large storage capacities, open up a promising new field of research.
Functional supervised classification with wavelets
Optimal bandwidth selection for variable kernel density estimates
It is well established that one can improve performance of kernel density estimates by varying the bandwidth with the location and/or the sample data at hand. Our interest in this paper is in the data-based selection of a variable bandwidth within an appropriate parameterized class of functions. We present an automatic selection procedure inspired by the combinatorial tools developed in Devroye and Lugosi (2001). It is shown that the expected L1 error of the corresponding selected estimate is up to a given constant multiple of the best possible error plus an additive term which tends to zero under mild assumptions.
Nonparametric Forecasting of the Manufacturing Output Growth with Firm-level Survey Data
A large majority of summary indicators derived from the individual responses to qualitative Business Tendency Surveys (which are mostly three-modality questions) result from standard aggregation and quantification methods. This is typically the case for the indicators called balances of opinion, which are currently used in short-term analysis and considered by forecasters as explanatory variables in many models. In the present paper, we discuss a new statistical approach to forecast the manufacturing growth from firm-survey responses. We base our predictions on a forecasting algorithm inspired by the random forest regression method, which is known to enjoy good prediction properties. Our algorithm exploits the heterogeneity of the survey responses, works fast, is robust to noise and allows for the treatment of missing values. Starting from a real application on a French dataset related to the manufacturing sector, this procedure appears as a competitive method compared with traditional algorithms. Keywords: Business Tendency Surveys, balance of opinion, short-term forecasting, manufactured production, k-nearest neighbor regression, random forecasts
Optimal L1 bandwidth selection for variable kernel density estimates
It is well-established that one can improve performance of kernel density estimates by varying the bandwidth with the location and/or the sample data at hand. Our interest in this paper is in the data-based selection of a variable bandwidth within an appropriate parameterized class of functions. We present an automatic selection procedure inspired by the combinatorial tools developed in Devroye and Lugosi [2001. Combinatorial Methods in Density Estimation. Springer, New York]. It is shown that the expected L1 error of the corresponding selected estimate is up to a given constant multiple of the best possible error plus an additive term which tends to zero under mild assumptions. Keywords: Variable kernel estimate, Nonparametric estimation, Partition, Shatter coefficient